Skunkware 5

home *** CD-ROM | disk | FTP | other *** search

/ Skunkware 5 / Skunkware 5.iso / man / cat.1 / ispell.1 < prev next >

Wrap

Text File | 1995-07-25 | 20KB | 397 lines

IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) UUUUNNNNIIIIXXXX SSSSyyyysssstttteeeemmmm VVVV ((((MMMMIIIITTTT)))) IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) NNNNAAAAMMMMEEEE ispell - Correct spelling for a file munchlist - Combine suffixes in a spelling list isexpand - Expand suffixes in a spelling list SSSSYYYYNNNNOOOOPPPPSSSSIIIISSSS iiiissssppppeeeellllllll [ ----tttt | ----xxxx | ----SSSS | ----dddd file | ----pppp file | ----wwww chars ] file ..... iiiissssppppeeeellllllll [ ----tttt | ----dddd file | ----pppp file | ----wwww chars ] ----llll iiiissssppppeeeellllllll [ ----tttt | ----dddd file | ----pppp file ] { ----aaaa | ----AAAA } iiiissssppppeeeellllllll [ ----wwww chars ] ----cccc iiiissssppppeeeellllllll ----vvvv mmmmuuuunnnncccchhhhlllliiiisssstttt [ ----dddd file | ----eeee | ----wwww chars ] [ files ] iiiisssseeeexxxxppppaaaannnndddd [ files ] DDDDEEEESSSSCCCCRRRRIIIIPPPPTTTTIIIIOOOONNNN _I_s_p_e_l_l is fashioned after the _s_p_e_l_l program from ITS (called _i_s_p_e_l_l on Twenex systems.) The most common usage is "ispell filename". In this case, _i_s_p_e_l_l will display each word which does not appear in the dictionary, and allow you to change it. If there are "near misses" in the dictionary (words which differ by only a single letter, a missing or extra letter, or a pair of transposed letters), then they are also displayed. If you think the word is correct as it stands, you can type either "Space" to accept it this one time, or "I" to accept it and put it in your private dictionary. If one of the near misses is the word you want, type the corresponding number. (If there are more than 10 choices, you may have to type a carriage return to complete a single-digit number). Finally, if none of these choices is right, you can type "R" and you will be prompted for a replacement word. If you want to see a list of words that might be close using wildcard characters, type "L" to lookup a word in the system dictionary. When a misspelled word is found, it is printed at the top of the screen. Any near misses will be printed on the following lines, and finally, two lines containing the word are printed at the bottom of the screen. If your terminal can type in reverse video, the word itself is highlighted. The ----vvvv option causes _i_s_p_e_l_l to print its current version identification on the standard output and exit. The ----llll or "list" option to _i_s_p_e_l_l is used to produce a list of misspelled words from the standard input. The ----aaaa option is intended to be used from other programs through a pipe. In this mode, _i_s_p_e_l_l expects the standard input to consist of lines containing single words. Each word is read, and a single line is written to the standard output. If the word was found in the main dictionary, or Page 1 (printed 3/9/94) IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) UUUUNNNNIIIIXXXX SSSSyyyysssstttteeeemmmm VVVV ((((MMMMIIIITTTT)))) IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) your personal dictionary, then the line contains only a '*'. If the word was found through suffix removal, then the line contains a '+', a space, and the root word. If the word is not in the dictionary, but there are near misses, then the line contains an '&', a space, and a list of the near misses separated by spaces. Also, each near miss is capitalized the same as the input word if unless such capitalization is illegal; in the latter case each near miss is capitalized correctly according to the dictionary. Finally, if the word neither appears in the dictionary, and there are no near misses, then the line contains only a '#'. This mode is also suitable for interactive use when you want to figure out the spelling of a single word. (These characters are the same as the codes that the real spell program uses.) The ----AAAA option works just like ----aaaa, except that if a line begins with the string "&Include_File&", the rest of the line is taken as the name of a file to read for further words. Input returns to the original file when the include file is exhausted. Inclusion may be nested up to five deep. The key string may be changed with the environment variable IIIINNNNCCCCLLLLUUUUDDDDEEEE____SSSSTTTTRRRRIIIINNNNGGGG (the ampersands, if any, must be included). When in the ----aaaa mode, _i_s_p_e_l_l will also accept lines of single words prefixed with either a '*' or a '@'. A line starting with '*' tells _i_s_p_e_l_l to insert the word into the user's dictionary (similar to the I command). A line starting with '@' causes _i_s_p_e_l_l to accept this word in the future (similar to the A command). The ----xxxx option causes _i_s_p_e_l_l to remove the .bak file that it normally leaves. The .bak file contains the pre-corrected text. If there are file opening / writing errors, the .bak file may be left for recovery purposes even with the -x option. The ----SSSS option suppresses _i_s_p_e_l_l's normal behavior of sorting the list of possible replacement words. Some people may prefer this, since it somewhat enhances the probability that the correct word will be low-numbered. The ----tttt option selects TeX/LaTeX input mode. TeX/LaTeX mode is also automatically selected if an input file has the extension ".tex". In this mode, whenever a backslash ("\") is found, _i_s_p_e_l_l will skip to the next whitespace. Thus, for example, given \chapter {This is a Ckapter} \cite{SCH86} will find "Ckapter" but will not look for SCH. The ----tttt option does not recognize the TeX comment character "%". The ----dddd option is used to specify an alternate hashed dictionary file, other than the default. If the filename Page 2 (printed 3/9/94) IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) UUUUNNNNIIIIXXXX SSSSyyyysssstttteeeemmmm VVVV ((((MMMMIIIITTTT)))) IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) does not begin with a "/", the library directory for the default dictionary file is prefixed. This is useful to allow dictionaries which prefer alternate British spellings ("centre", "tyre", etc), or add lists of special-purpose jargon and acronyms for subclasses of documents. There are some shortcomings in attempting to provide foreign-language dictionaries, but something like "-d french" could be made to work somewhat. The ----dddd option may specify /_d_e_v/_n_u_l_l, in which case the dictionary is limited to the personal one. This may be useful for certain private dictionaries. The ----pppp option is used to specify an alternate personal dictionary file. If the file name does not begin with "/", $HOME is prefixed. Also, the shell variable WORDLIST may be set, which renames the personal dictionary in the same manner. The command line overrides WORDLIST setting. If neither is present "~/.ispell_words" is used. The ----wwww option may be used to specify characters other than alphabetics which may also appear in words. For instance, ----wwww "&" will allow "AT&T" to be picked up. Underscores are useful in many technical documents. There is an admittedly crude provision in this option for 8-bit international characters. Non-printing characters may be specified in the usual way by inserting a backslash followed by the octal character code; e.g., "\014" for a form feed. Alternatively, if "n" appears in the character string, the (up to) three characters following are a DECIMAL code 0 - 255, for the character. For example, to include bells and form feeds in your words (an admittedly silly thing to do, but aren't most pedagogical examples): n007n012 Numeric digits other than the three following "n" are simply numeric characters. Use of "n" does not conflict with anything because actual alphabetics have no meaning - alphabetics are already accepted. _I_s_p_e_l_l will typically be used with input from a file, meaning that preserving parity for possible 8 bit characters from the input text is OK. If you specify the -l option, and actually type text from the terminal, this may create problems if your stty settings preserve parity. The ----cccc option is primarily intended for use by the _m_u_n_c_h_l_i_s_t shell script. In this mode, a list of words is read from the standard input. For each word, a list of possible root words and suffixes will be written to the standard output. Some of the root words will be illegal and must be filtered from the output by other means; the _m_u_n_c_h_l_i_s_t script does this. As an example, the command "echo BOTHER | ispell -c" produces: Page 3 (printed 3/9/94) IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) UUUUNNNNIIIIXXXX SSSSyyyysssstttteeeemmmm VVVV ((((MMMMIIIITTTT)))) IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) BOTH BOTHE/R BOTH/R Unless it has been installed without the feature by your system administrator, _i_s_p_e_l_l is aware of the correct capitalizations of words in the dictionary and in your personal dictionary. As well as recognizing words that must be capitalized (e.g., George) and words that must be all- capitals (e.g., NASA), it can also handle words with "unusual" capitalization (e.g., "ITCorp" or "TeX"). If a word is capitalized incorrectly, the list of possibilities will include all acceptable capitalizations. (More than one capitalization may be acceptable; for example, my dictionary lists both "ITCorp" and "ITcorp".) Normally, this feature will not cause you surprises, but there is one circumstance you need to be aware of. If you add a word to your dictionary that is at the beginning of a sentence (e.g., the first word of this paragraph if "unless" were not in the dictionary), it will be marked as "capitalization required". A subsequent usage of this word without capitalization (e.g., the quoted word in the previous sentence), _i_s_p_e_l_l will object and suggest the capitalized version. You must then compare the actual spellings by eye, and then type "I" to add the un-capitalized variant to your personal dictionary. The rules for capitalization are as follows: (1) Any word may appear in all capitals, as in headings. (2) Any word that is in the dictionary in all-lowercase form may appear either in lowercase or capitalized (as at the beginning of a sentence). (3) Any word that has "funny" capitalization (i.e., it contains both cases and there is an uppercase character besides the first) must appear exactly as in the dictionary, except as permitted by rule (1). If the word is acceptable in all-lowercase, it must appear thus in a dictionary entry. The _m_u_n_c_h_l_i_s_t shell script is used to reduce the size of dictionary files, primarily personal dictionary files. It is also capable of combining dictionaries from various sources. The given _f_i_l_e_s are read (standard input if no arguments are given), reduced to a minimal set of roots and suffixes that will match the same list of words, and written to standard output. Normally, words that are in the default dictionary are removed by _m_u_n_c_h_l_i_s_t during processing. If the list is to Page 4 (printed 3/9/94) IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) UUUUNNNNIIIIXXXX SSSSyyyysssstttteeeemmmm VVVV ((((MMMMIIIITTTT)))) IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) be used with a different dictionary, the ----dddd option can be used to specify an alternate (hashed) dictionary file containing words to be removed from the output list. If a dictionary file of /_d_e_v/_n_u_l_l is specified, no words will be removed from the output; this is useful when munching the primary dictionary file. The ----wwww option is passed on to _i_s_p_e_l_l. The ----eeee ("efficient") option causes the script to use a slower algorithm that uses somewhat less space in TMPDIR (normally /_u_s_r/_t_m_p). The _i_s_e_x_p_a_n_d shell script is used to expand the various suffix flags in an _i_s_p_e_l_l word list. This script can be used when looking words up in the dictionary, or to verify that a particular suffix flag actually produces the expected result. It is possible to install _i_s_p_e_l_l in such a way as to only support ASCII range text if desired. EEEENNNNVVVVIIIIRRRROOOONNNNMMMMEEEENNNNTTTT WORDLIST Personal dictionary file name INCLUDE_STRING Code for file inclusion under the -A option TMPDIR Directory used for some of munchlist's temporary files FFFFIIIILLLLEEEESSSS $HOME/.ispell_words user's private dictionary /usr/dict/words list of words for the Lookup function /_t_o_o_l_s/_s_o_u_r_c_e_s/_i_s_p_e_l_l directory for the following files: ispell.hash hashed dictionary for ispell isexp[1-4].sed sed scripts for expanding suffixes icombine program for combining suffix flags munchlist munchlist program isexpand isexpand program makedict* making your own dictionaries fixdict fix capitalization in dictionary ispell.4.hlp description of dictionary entries ispell.el sample GNU Emacs interface to ispell SSSSEEEEEEEE AAAALLLLSSSSOOOO spell(1), egrep(1), look(1), ispell(4) BBBBUUUUGGGGSSSS It takes about five seconds for _i_s_p_e_l_l to read in the hash table. The hash table is stored as a quarter-megabyte (or larger) array, so a PDP-11 version does not seem likely. Page 5 (printed 3/9/94) IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) UUUUNNNNIIIIXXXX SSSSyyyysssstttteeeemmmm VVVV ((((MMMMIIIITTTT)))) IIIISSSSPPPPEEEELLLLLLLL((((llllooooccccaaaallll)))) _I_s_p_e_l_l should understand more _t_r_o_f_f syntax, and deal more intelligently with contractions. While alternate dictionaries for foreign languages could be defined, and the international characters included in words, rules concerning word endings / pluralization accommodate English only. When the ----xxxx flag is specified, _i_s_p_e_l_l will unlink any existing .bak file. _M_u_n_c_h_l_i_s_t requires tremendous amounts of temporary file space for large dictionaries. It does respect the TMPDIR environment variable, so this space can be redirected. However, a lot of the temporary space it needs is for sorting, so TMPDIR is only a partial help on systems with an uncooperative _s_o_r_t(1). As a benchmark, the 15000-word _d_i_c_t._1_9_1 takes about 1200 blocks in TMPDIR, and 2000 in _s_o_r_t's temporary directories. Munching _d_i_c_t._1_9_1 with /_u_s_r/_d_i_c_t/_w_o_r_d_s (28000 words output) took another 1500 blocks or so, and ran for the better part of an hour. AAAAUUUUTTTTHHHHOOOORRRR Pace Willisson (pace@mit-vax) Collected, revised, and enhanced for the Usenet by Walt Buehring. Further enhanced and debugged by Isaac Balbin, Stewart Clamen, Mark Davies, Steve Dum, Gary Johnson, Don Kark, Steve Kelem, Jim Knutson, Geoff Kuenning, Evan Marcus, Dave Mason, Rob McMahon, Bob McQueer, David Neves, Joe Orost, Israel Pinkas, Gary Puckering, Bill Randle, Marc Ries, Rich Salz, Greg Schaffer, Joel Shprentz, George Sipe, Perry Smith, Stefan Taxhet, Andrew Vignaux, Johan Widen, James Woods, and Ken Yap. Page 6 (printed 3/9/94)